Integrating Cyberhaven with SIEM Tools Using Scripts (Legacy API v1)
This integration method uses the legacy API v1. If you are setting up this integration for the first time, use the workflow described in Cyberhaven Integration Workflow Best Practices Guide.
Incidents can be retrieved by running a script. There are two script files, start_monitoring.sh and siem-monitor-incidents.py.
The start_monitoring.sh script uses Cyberhaven's authentication token to call siem-monitor-incidents.py, which polls the /api/rest/v1/incidents/list API endpoint for incidents in the Cyberhaven dashboard. The list of incidents is displayed in your terminal window.
You can configure your SIEM tool to ingest the script's standard output.
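For example, one common pattern is to redirect the script output to a file that your SIEM agent or forwarder already monitors. The log path below is only an assumption; use whatever location your forwarder watches.

```bash
# Append each polled batch of incidents to a log file that the SIEM agent tails.
# /var/log/cyberhaven/incidents.log is an example path, not a requirement.
./start_monitoring.sh >> /var/log/cyberhaven/incidents.log 2>&1
```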
Using the script to poll for incidents
To run the start_monitoring.sh script, you need an API token and the hostname from your Cyberhaven tenant URL.
1. Log in to the Cyberhaven dashboard and go to Preferences > API token management to create an API token.
2. Copy the API token to a notepad. Also, copy the hostname from your Cyberhaven tenant URL to the notepad.
Example
If your Cyberhaven tenant URL is https://mycompany.cyberhaven.io/, then your hostname is mycompany.cyberhaven.io.
3. Create a local folder and copy the two scripts, start_monitoring.sh and siem-monitor-incidents.py into the folder.
start_monitoring.sh
Copy the following text into a notepad, enter the API token and hostname from step 2, then save the file as start_monitoring.sh to the folder you created.
```bash
#!/usr/bin/env bash
# this script polls incidents at the specified interval and outputs results to console
# the python script is a single-run script; you must manage intervals and failures externally
AUTH="eyJlbW......."
HOST="mycompany.cyberhaven.io"
INTERVAL_SECONDS=1
while python3 siem-monitor-incidents.py $AUTH $HOST; do sleep $INTERVAL_SECONDS; done
```
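Depending on how you saved the file, you may also need to mark it as executable before running it. This is a standard shell step, shown here for convenience:

```bash
# Make the polling script executable (run once, from the folder containing the scripts)
chmod +x start_monitoring.sh
```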
siem-monitor-incidents.py
To obtain the Python script, you can download the latest version:
```bash
curl https://storage.googleapis.com/files.cyberhaven.io/support/siem-monitor-incidents.py --output siem-monitor-incidents.py
```
Move the resulting siem-monitor-incidents.py to the same folder as start_monitoring.sh. The script is also available below for reference.
This script logs alerts about incidents found by Cyberhaven. You can use it to connect Cyberhaven Lightbeam with your SIEM. You can run this script as:
- a bash script,
- a cron job, or
- a scripted source (see https://docs.splunk.com/Documentation/SplunkCloud/8.0.0/AdvancedDev/ScriptSetup).
Before using the script, you should:
- Add the auth token to the start_monitoring.sh script.
- Point the HOST variable to your Cyberhaven hostname.
- Start the script as a bash script using ./start_monitoring.sh.
Customization
You might want to run the script as a cron job or as a scripted input in Splunk. In this case, run the Python script directly:
`python3 siem-monitor-incidents.py $AUTH $HOST`
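If you schedule it with cron, one possible crontab entry looks like the following. The install path and log file are assumptions, and the token and hostname are the same placeholders used in start_monitoring.sh; replace all of them with your own values.

```bash
# Example crontab entry (illustrative): poll for new incidents every minute and
# append them to a log file. Adjust the paths, token, and hostname for your environment.
* * * * * cd /opt/cyberhaven && /usr/bin/python3 siem-monitor-incidents.py "eyJlbW......." "mycompany.cyberhaven.io" >> /var/log/cyberhaven/incidents.log 2>&1
```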
Replaying custom events
To replay custom events, pass the additional --replay-old parameter to the Python script. It replays all pre-existing events and outputs them to the console.
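For example, using the placeholder token and hostname from the earlier steps:

```bash
# Replay all pre-existing incidents once and print them to the console
python3 siem-monitor-incidents.py "eyJlbW......." "mycompany.cyberhaven.io" --replay-old
```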
```python
import argparse
import base64
import copy
import json
import logging
import os
from datetime import datetime, timedelta

import requests

parser = argparse.ArgumentParser(
    description='Fetches Cyberhaven incidents and logs them')
parser.add_argument("auth", help="base64 encoded credentials", type=str)
parser.add_argument(
    "host", help="hostname format `example.com` without protocol", type=str)
parser.add_argument(
    "--replay-old", help="replay already existing incidents",
    default=False, action='store_true')
parser.add_argument(
    "--no-verify", help="allow insecure requests",
    default=False, action='store_true')
args = parser.parse_args()

AUTH_PATH = "/user-management/auth/token"
ALERTS_PATH = "/api/rest/v1/incidents/list"
AUTH = args.auth
HOST = args.host
API_PATH = "https://" + HOST
REPLAY_OLD = args.replay_old
# verify ssl certificate
VERIFY = not args.no_verify

nowTimeISO = datetime.utcnow().isoformat() + "Z"
# if we replay old events then set the start date to something in the past
if args.replay_old:
    nowTimeISO = "2001-01-01T00:00:00Z"

# incidents default request data
incidents_request_data = {
    # amount of entities to fetch
    "page_size": 10,
    "sort_desc": False,
    "sort_by": "event_time",
    "filters": {
        "times_filter": {
            "end_time": "2090-09-01T23:59:59Z",
            "start_time": nowTimeISO
        }
    }
}

logging.basicConfig(level=logging.INFO)


def authenticate_request(base64_creds):
    """request to fetch auth token"""
    creds_json = base64.decodebytes(bytes(base64_creds, 'utf8'))
    creds = json.loads(creds_json)
    r = requests.post(url=API_PATH + AUTH_PATH,
                      verify=VERIFY,
                      data=creds, headers={'HOST': HOST})
    r.raise_for_status()
    return r.content.decode("utf-8")


def incidents_request(token, start_time, size, search_after=""):
    """
    request to fetch incidents
    token is the authentication token returned by the auth request
    page_id is the pagination cursor
    """
    request_data = copy.deepcopy(incidents_request_data)
    request_data.update({'page_id': search_after, 'page_size': size})
    request_data["filters"]["times_filter"]["start_time"] = start_time
    auth_header = 'Bearer {}'.format(token)
    r = requests.post(url=API_PATH + ALERTS_PATH, json=request_data,
                      verify=VERIFY,
                      headers={'authorization': auth_header,
                               'content-type': 'application/json;charset=UTF-8'})
    if r.status_code == 401:
        print("Not authenticated")
    if r.status_code >= 400:
        print(r.text)
        exit(0)
    json_data = r.json()
    records = json_data.get("incidents", None)
    return {
        'alerts': records,
        'next_page_id': json_data.get('next_page_id', '')
    }


def format_event(event):
    source_edge = event['edge']['source']
    destination_edge = event['edge']['destination']
    eventid = event['id']
    search_name = "Dataset: {}, Category: {}".format(
        event['category']['name'], event['dataset']['name'])
    file_name = 'none'
    if 'path_basename' in source_edge.keys():
        file_name = source_edge['path_basename']
    source_location = source_edge['location_outline']
    source_type = source_edge['location']
    destination_location = destination_edge['location_outline']
    destination_type = destination_edge['location']
    date = destination_edge['local_time']
    user = destination_edge.get('local_user_name', 'none')
    event_type = destination_edge['event_type']
    hostname = 'none'
    if 'hostname' in destination_edge.keys():
        hostname = destination_edge['hostname']
    return json.dumps({
        'id': eventid,
        'search_name': search_name,
        'file_name': file_name,
        'source_location': source_location,
        'source_type': source_type,
        'destination_location': destination_location,
        'destination_type': destination_type,
        'date': date,
        'user': user,
        'event_type': event_type,
        'hostname': hostname
    })


def log_events(events):
    for event in events:
        data = format_event(event)
        print(data)


class LastIncidentObjectPersisted:
    current = ''
    file = None

    def __init__(self, path):
        self.path = path
        mode = 'r+' if os.path.exists(path) else 'w+'
        self.file = open(path, mode)
        self.current = self.file.read()

    def get(self):
        """return persisted value"""
        strippedValue = self.current.strip()
        data = strippedValue.split("_", 1)
        search_after = ""
        start_time = ""
        if len(data) >= 1:
            start_time = data[0]
        if len(data) == 2:
            search_after = data[1]
        return {
            'search_after': search_after,
            'start_time': start_time
        }

    def set(self, start_time, search_after):
        """set value to file"""
        formatedValue = "{}_{}".format(start_time, search_after)
        self.file.seek(0)
        self.file.truncate(0)
        self.file.write(formatedValue)
        os.fsync(self.file)
        self.current = formatedValue


one_ms = timedelta(0, 0, 1000)


def parse_nano_date(str):
    arr = str.replace("Z", "").split(".")
    d = datetime.fromisoformat(arr[0])
    if len(arr) > 1:
        ms = int(arr[1][0:6])
        d = d.replace(microsecond=ms)
    return d


def main():
    token = authenticate_request(args.auth)
    script_dir = os.path.realpath(
        os.path.join(os.getcwd(), os.path.dirname(__file__)))
    last_incident_id_store = LastIncidentObjectPersisted(
        os.path.join(script_dir, './incidents.state'))
    persisted_value = last_incident_id_store.get()
    if persisted_value['start_time'] == "":
        last_incident_id_store.set(nowTimeISO, '')
        persisted_value['start_time'] = nowTimeISO
    events = incidents_request(token, persisted_value['start_time'], size=30,
                               search_after=persisted_value["search_after"])
    log_events(events["alerts"])
    if events["next_page_id"] == '' and len(events['alerts']) > 0:
        # take the date, remove the trailing Z, add one millisecond, then add the Z back
        # api and python iso formats are different
        # when we reach the end of the list, store the time of the last event
        # and use it to continue pagination
        d = parse_nano_date(events['alerts'][-1]['event_time']) + one_ms
        result_date = d.isoformat() + "Z"
        last_incident_id_store.set(result_date, events["next_page_id"])
    else:
        last_incident_id_store.set(persisted_value['start_time'],
                                   events["next_page_id"])


if __name__ == '__main__':
    main()
```
4. Open a terminal window, navigate to the folder containing the two scripts, and run ./start_monitoring.sh.
The script returns a list of incidents based on the polling interval defined in start_monitoring.sh. Each incident record contains the unique ID of the incident, the name of the alert that generated it, and metadata for the source and destination of the data flow that triggered it.
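Each incident is printed as a single JSON line. The record below is illustrative only: the field names come from the script output, but every value is made up.

```json
{"id": "6463f5a1b2c3d4e5f6a7b8c9", "search_name": "Dataset: Customer data, Category: Exfiltration to personal cloud", "file_name": "customers.xlsx", "source_location": "endpoint", "source_type": "LAPTOP-0123", "destination_location": "website", "destination_type": "drive.google.com", "date": "2024-05-01T10:15:30Z", "user": "jdoe", "event_type": "file_upload", "hostname": "LAPTOP-0123"}
```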
Script Output Field Descriptions
The table below lists all the incident fields that the script output displays in the terminal.
| Fields | Description |
|---|---|
| id | An identifier that was assigned by Cyberhaven for the incident. |
| search_name | The dataset and policy name. |
| file_name | The file name of the flow that triggered the incident. |
| source_location | The type of source location of the matching data flow, for example, endpoint, website, etc. |
| source_type | A short outline of the source location that triggered the incident such as node, hostname for endpoint, email for cloud, device name for removable media. |
| destination_location | The type of destination location of the matching data flow, for example, endpoint, website, etc. |
| destination_type | A short outline of the destination location that triggered the incident such as node, hostname for endpoint, email for cloud, device name for removable media. |
| date | The date of the event that triggered the incident. |
| user | The username that triggered the incident. |
| event_type | The type of event that led to data arriving at / leaving from a location. |
| hostname | The hostname of an endpoint/share where data resides. |
Metadata Field Descriptions
The table below lists all the metadata we collect for the source and destination.
| Fields | Description |
|---|---|
| Data | |
| data.app_commandline | The command line that started the application for accessing the data. |
| data.app_description | The description of the application used to access the data. |
| data.app_main_window_title | The title of the main application window. |
| data.app_name | The name of the application used to access the data. |
| data.app_package_name | The package name for Modern Windows applications. |
| data.blocked | A flag indicating whether the event was blocked. |
| data.browser_page_domain | The domain extracted from the referrer URL. |
| data.browser_page_title | The page title of the website |
| data.browser_page_url | The referrer URL. |
| data.category | The type of website, for example, webmail, social media, etc. |
| data.cloud_app | The type of cloud application, for example, OneDrive, SharePoint, Google Drive, etc. |
| data.cloud_app_account | The user account used to log in to the website. |
| data.cloud_provider | The type of cloud provider storing data, for example, Office 365, Salesforce, etc. |
| data.comment | The device description, such as a printer as provided by the admin user. |
| data.content_uri | For email retrieved by the cloud sensor, the URI of the attachment in O365 servers. |
| data.data_size | The size of a piece of data, for example, when copy and pasted. |
| data.device_physical_location | The physical location such as, the building or room where the device is stored. Examples of devices are printers, scanners, etc. |
| data.domain | The domain name, in the form .sub.domain.tld. |
| data.domain_components | The domain name split into domain components. |
| data.driver | The printer device driver, if applicable. |
| data.email_account | The email address that identifies the mailbox where the data resides. |
| data.email_groups | The geographic location of the email account. |
| data.email_subject | The subject of the email where the data resides. |
| data.endpoint_id | The identifier for the endpoint where the event was generated. |
| data.event_type | The type of event that led to data arriving or leaving a location. |
| data.extension | The file type or the extension of the file. |
| data.file_size | The size of a file in bytes. |
| data.group_name | The list of Active Directory groups to which the user accessing the file belongs. |
| data.hostname | The hostname of an endpoint or share where the data resides. |
| data.job_name | Print job name. |
| data.local_file_id | The ID of a file on an endpoint/network share/cloud storage. For local files on endpoints/shares, the file ID is obtained by concatenating the Volume Serial Number and the File ID provided by the file system, in the form volume_serial_number;file_id. |
| data.local_groups | The list of Active Directory groups to which the user accessing the file belongs. |
| data.local_machine_name | The hostname of the machine where the event happened. |
| data.local_time | The time when the data arrived in the silo. |
| data.local_user_name | The username of the user accessing the data. |
| data.local_user_sid | The SID of the user accessing the data. |
| data.location | The type of location where data resides, for example, endpoint, website, etc. |
| data.location_outline | A short outline of the location that triggered the incident. For example, node, the hostname of the endpoint, email, and device name of removable media. |
| data.md5_hash | The MD5 hash of a file at a location. |
| data.media_category | The type of removable media. |
| data.path | The path where the data being accessed resides or, in the case of EndpointApps, where the application resides. |
| data.path_basename | The file name retrieved from the file path. |
| data.path_components | The path to a file split in a sequence of folders. |
| data.port | The device port name. For example, if the device is a printer, then the port names are LPT, USB, etc. |
| data.printer_name | The name of the printer. |
| data.quick_xor_hash | The quick XOR hash of a file at a location. |
| data.removable_device_id | The unique ID of the removable device. |
| data.removable_device_name | The name of the removable device. |
| data.removable_device_product_id | The 16-bit number assigned to specific USB device models by the manufacturer. |
| data.removable_device_usb_id | The unique hardware ID of the removable media. |
| data.removable_device_vendor_id | The 16-bit number assigned to USB device manufacturers by the USB Implementers Forum. |
| data.salesforce_account_domains | Salesforce domain name from user’s email address. |
| data.salesforce_account_name | Name of the Salesforce account. |
| data.url | The exact URL used to access the data. |
| Category | |
| category.dataset_ids.exclude_origin | When “true” this field indicates that “Exclude flows to datasets origin” is enabled in the policy. |
| category.dataset_ids.name | Name of the policy that matched the activity. |
| category.rule | Object for category rules configuration. |
| category.rule.allow_request_review | When “true” this field indicates that “Allow the user to request a policy review” is enabled in the response message of the policy. |
| category.rule.create_incident | When “true” this field indicates that “Create an incident” is enabled in the policy. |
| category.rule.incident_action | The action to be taken when a policy match occurs such as, Warn, Block, or Nothing. |
| category.rule.notify_email | The email address to which the incident notification is sent. |
| category.rule.notify_enabled | When “true” this field indicates that “Send email notifications” is enabled in the policy. |
| category.rule.override_enabled | When “true” this field indicates that “Allow user to override blocking” is enabled in the response message of the policy. |
| category.rule.require_justification | When “true” this field indicates that the user is required to provide a justification. |
| category.rule.should_ack_warning | When “true” this field indicates that the user is required to acknowledge a warning. |
| category.rule.show_logo | When “true” this field indicates that a logo was uploaded to be shown in the response message. |
| category.rule.show_title | When “true” this field indicates that “Show the dialog title” is enabled in the response message of the policy. |
| category.rule.status | The risk level assigned to the data that matches the policy such as, High, Low, or Trusted. |
| Dataset | |
| dataset.name | Name of the dataset that identifies the category of data that matched the policy. |
| File | |
| file | The name of the original file for the data flow that triggered the incident. |
| Incident Reactions | |
| incident_reactions | The user’s reaction to an incident or a response message if applicable. |
| Incident Response | |
| incident_response | The policy’s response to a user action that triggered the incident. For example, Warning shown, Provided an explanation. |
| Resolution Status | |
| resolution_status | The incident resolution status that was set by an administrator or operator such as Assigned, Ignored etc. |
Integrating Cyberhaven with Splunk Using the App (Legacy API v1)
NOTE
The Integrations feature replaces this legacy solution. If you are setting up this integration for the first time, use the workflow described in Cyberhaven Integration Workflow Best Practices Guide.
Splunk integration enables you to monitor incidents generated by Cyberhaven Sensors in your Splunk dashboard. To integrate with Splunk, you must:
- Download the Cyberhaven app from Splunk Apps.
- Authorize Cyberhaven in Splunk to receive incident data.
Download the Cyberhaven app
1. Log in to your Splunk instance (on-prem or cloud).
2. Click Find More Apps in the left panel.
3. Search for the Cyberhaven Splunk application and click Install.
Authorize Cyberhaven in Splunk
1. When the installation is complete, go to the Search tab and enter a new search, index="cyberhaven", to search for the app. On the Inputs page, click Create New Input.
2. In the Add Incidents script dialog box, provide the input parameters.
3. To get your Auth Token and Host URL:
   a. Browse to the following URL: https://<instance-name>.cyberhaven.io/api-tokens
   b. Click CREATE NEW TOKEN and give it a name (for example, splunk integration).
   c. Click CREATE. Cyberhaven creates an API token, which is the Auth Token.
   d. Copy the token to the clipboard and close the window.
4. Go back to the Add Incidents script dialog box in your Splunk instance and do the following:
   a. Enter the host URL as <your-instance-name>.cyberhaven.io.
   b. Paste the Auth Token you copied in the previous step.
   c. Click Add. The new data input is displayed on the Inputs page.
5. Go to the Search tab. In the New Search text box, search for the following:
   index="cyberhaven"
A list of all the incidents from the Cyberhaven Console will now be available on your Splunk dashboard.
Viewing Incidents in Splunk
Log in to the Splunk dashboard to view all the incidents generated on the Cyberhaven Incidents page. New incidents from Cyberhaven are automatically added to the Splunk dashboard. You can refresh the page or run the search query to get the latest list of incidents.
Since the Cyberhaven Splunk application can process a high load (on the order of a million incidents per minute), incidents are displayed within a minute after they appear on the Cyberhaven Incidents page.
Incident Severity
A severity rating is assigned to each incident based on the severity rating you chose for the Cyberhaven policy. Starting with Cyberhaven Console v23.03, policies have additional severity ratings. Cyberhaven assigns a value to each rating that will be used to calculate the risk score in the upcoming risk management tools.
Policies have the following five severity ratings.
| Severity Rating | Value |
|---|---|
| Critical | 4 |
| High | 3 |
| Medium | 2 |
| Low | 1 |
| Informational | 0 |
In the Splunk dashboard, the severity ratings of the incidents are classified as "risky" for High severity incidents and "low_risky" for Low severity incidents. With the addition of new severity ratings in the Cyberhaven Console v23.03, incidents are generated for all severity levels.
NOTE
Due to the policy severity rating enhancements, newly reported incidents on the dashboard will be assigned a severity rating of "risky".
Incident fields
The incident fields in the Splunk dashboard include all the fields that can be exported from the Cyberhaven Incidents page as well as the fields defined in the Common Information Model (CIM).
Read more: https://docs.splunk.com/Documentation/CIM/4.2.0/User/Alerts
Cyberhaven sends the following list of CIM fields to Splunk.
```python
'app': format_value(destination_edge.get('app_name')),
'body': format_value(content),
'dest': format_value(destination_location),
'dest_category': format_value(event.get('category').get('id')),
'src': format_value(source_edge.get('hostname')),
'src_bunit': format_value(source_edge.get('location')),
'src_category': format_value(source_edge.get('event_type')),
'src_priority': format_value(source_edge.get('event_type')),
'subject': format_value(event.get('category').get('name')),
'tag': format_value(",".join(event['content_tags'] or [])),
'type': format_value(destination_edge.get('event_type')),
```
Additionally, the following fields from the Incidents page are sent to Splunk (an example search using some of these fields follows the list).
assignee
resolution_status (filter: unresolved)
dataset_name
category_severity
category_name
user
file
content
content_tags
response
user_reaction
explanation
resolved_by
timestamp UTC
reaction_time UTC
resolution_time UTC
app_name
app_main_window_title
app_package_name
app_description
app_command_line
blocked
browser_page_url
browser_page_domain
browser_page_title
content_uri
cloud_provider
cloud_app
data_size
domain
domain_category
event_type
endpoint_id
email_account
email_groups
destination_file_path
source_file_path
file_extension
file_size
group_name
hostname
location
local_time UTC
local_user_name
local_user_sid
local_groups
local_machine_name
media_category
md5_hash
salesforce_account_name
salesforce_account_domains
printer_name
removable_device_name
removable_device_vendor_id
removable_device_product_id
url
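As an illustration of how these fields can be used in a search, the query below narrows the results to high-severity incidents. The field names are taken from the list above, but the values ("High") and the selected columns are assumptions; check the actual values in your own data.

```
index="cyberhaven" category_severity="High"
| table _time, user, dataset_name, category_name, hostname, url
```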
Limitations
- If incidents do not appear right away on the Cyberhaven Incidents page, they will not appear in the Splunk dashboard either.
- The cloud version of the Splunk App takes longer to update compared to the desktop version. The desktop and cloud versions of the Cyberhaven Splunk application are available here.
Troubleshooting
1. If no events are showing up in your Splunk app, check that you entered the correct Auth Token. If the token is incorrect, remove the Cyberhaven Splunk application, then download and set up the app again.
2. You can search for debug logs locally. In the Splunk UI, go to the Search tab and enter the following search queries in the New Search text box.
```
index="_internal" "incidents"
index="_internal" "siem"
```